Approximate Modified Policy Iteration

Authors

  • Bruno Scherrer
  • Victor Gabillon
  • Mohammad Ghavamzadeh
  • Matthieu Geist
Abstract

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are extensions of well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide an error propagation analysis that unifies those for approximate policy and value iteration. For the classification-based implementation, we develop a finite-sample analysis showing that MPI's main parameter controls the balance between the estimation error of the classifier and the overall value function approximation.
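To make concrete how MPI contains both value and policy iteration, here is a minimal sketch of the exact (non-approximate) tabular algorithm. It is not one of the three AMPI implementations proposed in the paper. The function name, the array layout of `P` and `R`, and the parameter name `m` for MPI's main parameter are assumptions made for this illustration.

```python
import numpy as np


def modified_policy_iteration(P, R, gamma, m, n_iters=200):
    """Exact tabular MPI (illustrative sketch, not the paper's AMPI).

    P: array of shape (A, S, S), P[a, s, t] = Pr(next state t | state s, action a).
    R: array of shape (S, A), expected immediate reward.
    gamma: discount factor in [0, 1).
    m: number of applications of the policy Bellman operator per iteration.
       m = 1 is value iteration; large m approaches policy iteration.
    """
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(n_iters):
        # Greedy step: pick the action maximizing the one-step lookahead value.
        q = R + gamma * np.einsum("ast,t->sa", P, v)
        policy = q.argmax(axis=1)
        # Partial evaluation step: apply the Bellman operator of the
        # greedy policy m times, starting from the current value estimate.
        P_pi = P[policy, states]   # (S, S) transition matrix under the policy
        r_pi = R[states, policy]   # (S,) reward vector under the policy
        for _ in range(m):
            v = r_pi + gamma * P_pi @ v
    return v, policy
```

With `m=1` each iteration performs a single Bellman backup, which is value iteration. As `m` grows, the inner loop approaches a full evaluation of the greedy policy, which is policy iteration. Intermediate values trade per-iteration evaluation cost against the number of greedy updates.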

Similar resources

Solving time-fractional chemical engineering equations by modified variational iteration method as fixed point iteration method

The variational iteration method (VIM) was extended to find approximate solutions of fractional chemical engineering equations. The Lagrange multipliers of the VIM were not identified explicitly. In this paper we improve the VIM by using the concept of the fixed point iteration method. This method was then implemented for solving systems of time-fractional chemical engineering equations. The ob...

Approximate modified policy iteration and its application to the game of Tetris

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximation form which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are exten...

Optimization of Solution Regularized Long-wave Equation by Using Modified Variational Iteration Method

In this paper, a regularized long-wave equation (RLWE) is solved by using Adomian's decomposition method (ADM), the modified Adomian's decomposition method (MADM), the variational iteration method (VIM), the modified variational iteration method (MVIM) and the homotopy analysis method (HAM). The approximate solution of this equation is calculated in the form of a series whose components are computed by ...

Non-Stationary Approximate Modified Policy Iteration

We consider the infinite-horizon γ-discounted optimal control problem formalized by Markov Decision Processes. Running any instance of Modified Policy Iteration (a family of algorithms that can interpolate between Value and Policy Iteration) with an error ε at each iteration is known to lead to stationary policies that are at least 2γε/(1−γ)²-optimal. Variations of Value and Policy Iteration, that ...

Some New Existence, Uniqueness and Convergence Results for Fractional Volterra-Fredholm Integro-Differential Equations

This paper studies some significant recent innovations in approximation techniques for finding approximate solutions of Caputo fractional Volterra-Fredholm integro-differential equations. To this aim, the study uses the modified Adomian decomposition method (MADM) and the modified variational iteration method (MVIM). The wider applicability of these techniques is based on thei...

Journal:
  • CoRR

Volume abs/1205.3054

Publication year: 2012